Abstract

Background: Maternal-fetal genotype incompatibility (MFGI) is increasingly reported to influence human diseases, especially pregnancy-related complications. In practice, it is challenging to identify the ideal incompatibility model for analysis, since the true MFGI mechanism is generally unknown. The underlying MFGI mechanism can vary across genetic variants, and using a single incompatibility model in all circumstances can cause a loss of power when testing for MFGI.

Results: In this article, we propose a practical 2-step procedure that first uses a model selection strategy based on an entropy measurement to select the MFGI model best supported by the data, and then tests the significance of the MFGI effect using the chosen model within the generalized linear regression framework.

Conclusions: Our simulation studies show that the proposed two-step procedure controls the type I error rate and increases the testing power under various scenarios. In a real data application, our analysis reveals genes with an MFGI effect that may not be detected by the counterpart approach without model selection.

Keywords: Complex disease, Pregnancy complications, Association study, Maternal-fetal genotype incompatibility 



Background 

Current advances in high-throughput biotechnology have popularized genome-wide association studies (GWAS) to detect genetic variants that increase the risk of complex diseases. Over the past decade, thousands of single nucleotide polymorphisms (SNPs) have been reported to be associated with various human diseases. Despite the numerous successes of GWAS, the majority of the heritability of many complex diseases remains unexplained [1-5]. Recent genomic research provides compelling evidence that complex human diseases are multifactorial and involve both genetic and environmental factors. Failing to account for more sophisticated components, such as gene-gene interactions, gene-environment interactions, and epigenetic functions, can contribute to the missing heritability of most common diseases.
The underlying genetic architecture can be especially complicated for diseases that develop during human pregnancy, since both maternal and fetal genomes are involved. In general, the fetus inherits one copy of the genome from each of its parents, and the two copies are not identical. Previous family-based or twin studies indicated that the heritability of obstetric diseases is high. For example, an earlier twin study reported a heritability of 17% for preterm delivery in a first pregnancy and 27% for preterm delivery in any pregnancy [6], and another study suggested a heritability range of 25%-40% for birthweight and gestational length [7]. Maternal and fetal genes, either individually or in combination, could increase the risk of diseases such as hemolytic disease of the newborn [8], preterm birth [9,10], small for gestational age [11], pre-eclampsia [12-14], and preterm prelabor rupture of membranes (pPROM) [15]. Incompatibility between maternal and fetal genotypes, in which the expression of genes from two generations leads to opposing effects, plays a vital role and can increase the risk of these diseases. However, most current association studies of obstetric diseases have focused on only one genome when searching for susceptibility loci; that is, only the maternal or only the fetal genome was searched for associated genetic factors when a maternal or fetal disorder was studied.

Evidence is accumulating that the interaction between maternal and fetal genes, more than maternal genes alone, plays an important role in the etiology of pregnancy complications [16-19]. In other words, an increased risk of certain disorders could be due to a specific combination of maternal and fetal genotypes. At each locus, the fetus inherits only one of its two alleles from the mother. Mismatches between maternal and fetal genotypes may lead to adverse effects while the fetus resides in utero and increase the risk of disease. A good example of this deleterious effect is the allogenic response. If a bi-allelic locus has a null allele and an antigen-coding allele, the mother is homozygous for the null allele, and the fetus inherits an antigen-coding allele from the father, the mother may produce an allogenic response to the fetal antigen, which is harmful to the fetus. This type of incompatibility between maternal and fetal genotypes is well illustrated by Rh incompatibility, which develops when a pregnant woman is Rh-negative (d/d) and the fetus is Rh-positive (D/d) at the RhD locus. Red blood cells from the fetus can cross the placenta into the maternal bloodstream. The maternal immune system treats the Rh-positive fetal cells as foreign and makes antibodies against them. These antibodies may cross back into the developing fetus and destroy its circulating blood cells, which can cause hemolytic disease of the newborn (HDN). Therefore, searching parental and offspring genomes simultaneously to identify genes with maternal-fetal genotype incompatibility (MFGI) is highly recommended [20-26].

Study designs in which data are collected from parent-offspring triads or mother-offspring dyads are the most commonly used to investigate the marginal and joint effects of maternal and fetal genes. Most currently available statistical approaches for analyzing this type of data fall within the framework of generalized linear regression models. Maternal-fetal genotype tests based on log-linear modeling of child-parent triads have been developed [20,27-29]. These tests are robust to population stratification because they compare the distributions of affected and unaffected individuals given the parental mating type, instead of comparing allele or genotype frequencies between cases and controls. However, these tests and their extensions require that at least some paternal data be available. For dyad data, in which paternal data are 100% missing, methods based on logistic regression models have been proposed [22,23,25].

Although it has been widely hypothesized that mismatches between maternal and fetal genotypes can cause incompatibility, the underlying biological mechanism remains unclear. It is therefore challenging to model incompatibility appropriately and to code the corresponding variable. That is, if a binary variable $G_{ic}$ denotes the MFGI effect, it is unclear for which maternal-fetal genotype combinations the variable should be coded as 1 rather than 0. Parimi et al. [30] evaluated the performance of 6 plausible incompatibility models and concluded that the most comprehensive model, which codes genotype incompatibility whenever maternal and fetal genotypes differ, consistently outperformed the other models. However, only the maternal-fetal incompatibility effect was simulated in their study; maternal and fetal main effects were not considered along with MFGI. When a maternal or fetal main effect co-exists with MFGI, this approach dramatically inflates the type I error rate. Even if only an incompatibility effect is present, the recommended model does not always achieve greater power than the true incompatibility model.

In this study, we developed a 2-step statistical strategy for testing MFGI effects in designs that collect data from mothers and offspring, which can increase the testing power under a wide range of scenarios. We propose to first select the MFGI model based on an entropy measurement via a permutation procedure, and then to test the MFGI effect using the selected incompatibility model within the logistic regression framework.

Methods 

Genetic model 

Consider a study that enrolls case and control mother-offspring pairs from a target population. Collected data include the genotypes of mothers and offspring, the disease phenotype of interest (the phenotype of the mother or the child), and other covariates for a total of $n$ independent mother-offspring pairs ($n_0$ controls and $n_1$ cases, $n_0 + n_1 = n$). Let $G_m$ and $G_o$ denote the maternal and fetal genotypes of a particular SNP, respectively. Under the commonly used additive genetic model, $G_m$ and $G_o$ take the value 0, 1, or 2 according to the number of copies of the minor allele carried by the mother and the offspring, respectively. Let $Y = (y_1, y_2, \ldots, y_n)^T$ denote the vector of phenotypes, where $y_i$ is the dichotomous disease outcome of the $i$th family unit in the sample, with $y_i = 1$ or 0 for affected or unaffected individuals, respectively.

Consider a bi-allelic genomic locus with 2 alleles, A and a, where A denotes the rare allele. Under Mendelian inheritance, there are seven possible maternal-fetal genotype combinations (see Table 1). The 4 mismatched maternal-fetal genotype combinations are denoted $M_1$, $M_2$, $M_3$, and $M_4$. It is possible that any mismatched maternal-fetal genotype combination leads to incompatibility, or that only a specific mismatched combination, or a certain collection of these combinations, is associated with the risk of disease. Therefore, in the absence of evidence from molecular genetics analysis, it is challenging to determine which incompatibility model fits the biological mechanism. Here, we consider 11 biologically plausible incompatibility models (Table 2) and propose a 2-step procedure to identify genomic loci that have an MFGI effect on a disease outcome of interest. We first select an MFGI model based on an entropy measurement and then test the statistical significance of MFGI using the chosen incompatibility model. Details of the 2-step procedure are described in the following section.

Table 1 Possible maternal-fetal genotype combinations

| $G_m$ / $G_o$ | AA (2) | Aa (1) | aa (0) |
|---|---|---|---|
| AA (2) | 0 | $M_1$ | – |
| Aa (1) | $M_2$ | 0 | $M_3$ |
| aa (0) | – | $M_4$ | 0 |

Rows: maternal genotype $G_m$; columns: offspring genotype $G_o$; –: combination not possible under Mendelian inheritance.

Table 2 Biologically plausible models of maternal-fetal genotype incompatibility

| Model | GC | $G_m$ | $G_o$ | Scenario |
|---|---|---|---|---|
| 1 | $M_1$ | AA | Aa | Mother has 1 more copy of allele A than the heterozygous offspring |
| 2 | $M_2$ | Aa | AA | Offspring has 1 more copy of allele A than the heterozygous mother |
| 3 | $M_3$ | Aa | aa | Mother has risk allele A that the offspring does not |
| 4 | $M_4$ | aa | Aa | Offspring has risk allele A that the mother does not |
| 5 | $M_1$ | AA | Aa | Mother-offspring pair has 3 copies of allele A |
| | $M_2$ | Aa | AA | |
| 6 | $M_1$ | AA | Aa | Mother has 1 more copy of allele A |
| | $M_3$ | Aa | aa | |
| 7 | $M_1$ | AA | Aa | Offspring has an allele that the mother does not |
| | $M_4$ | aa | Aa | |
| 8 | $M_2$ | Aa | AA | Mother has an allele that the offspring does not |
| | $M_3$ | Aa | aa | |
| 9 | $M_2$ | Aa | AA | Offspring has 1 more copy of allele A |
| | $M_4$ | aa | Aa | |
| 10 | $M_3$ | Aa | aa | Mother-offspring pair possesses 3 copies of allele a |
| | $M_4$ | aa | Aa | |
| 11 | $M_1$ | AA | Aa | All possible mismatched maternal-fetal genotype combinations |
| | $M_2$ | Aa | AA | |
| | $M_3$ | Aa | aa | |
| | $M_4$ | aa | Aa | |

A: Minor allele; GC: Genotype combination.
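Tables 1 and 2 can be encoded directly, which makes the coding of $G_{ic}$ explicit. The sketch below is our own illustration, not the authors' code: it assumes the additive coding used in this section (2 = AA, 1 = Aa, 0 = aa), enumerates the combinations allowed under Mendelian inheritance, and codes the binary indicator $G_{ic}$ for any of the 11 models. The names `MODELS`, `mendelian_compatible`, and `g_ic` are hypothetical.

```python
from itertools import product

# Additive coding: number of copies of the minor allele A (2 = AA, 1 = Aa, 0 = aa).
GENOTYPE_LABELS = {2: "AA", 1: "Aa", 0: "aa"}

# Table 1: the four mismatched (maternal, offspring) genotype combinations.
MISMATCHES = {
    "M1": (2, 1),  # mother AA, offspring Aa
    "M2": (1, 2),  # mother Aa, offspring AA
    "M3": (1, 0),  # mother Aa, offspring aa
    "M4": (0, 1),  # mother aa, offspring Aa
}

# Table 2: each model lists the combinations coded as incompatible (G_ic = 1).
MODELS = {
    1: {"M1"}, 2: {"M2"}, 3: {"M3"}, 4: {"M4"},
    5: {"M1", "M2"}, 6: {"M1", "M3"}, 7: {"M1", "M4"},
    8: {"M2", "M3"}, 9: {"M2", "M4"}, 10: {"M3", "M4"},
    11: {"M1", "M2", "M3", "M4"},
}


def mendelian_compatible(g_m, g_o):
    """True if a mother with genotype g_m can have an offspring with genotype g_o."""
    # Alleles the mother can transmit (1 = A, 0 = a).
    maternal = {1} if g_m == 2 else {0} if g_m == 0 else {0, 1}
    # The paternal allele is unrestricted, so g_o minus the maternal allele must be 0 or 1.
    return any(g_o - a in (0, 1) for a in maternal)


def g_ic(g_m, g_o, model):
    """MFGI indicator G_ic for one mother-offspring pair under a Table 2 model."""
    return int(any(MISMATCHES[name] == (g_m, g_o) for name in MODELS[model]))


# The seven maternal-fetal combinations allowed under Mendelian inheritance.
allowed = [(m, o) for m, o in product((2, 1, 0), repeat=2) if mendelian_compatible(m, o)]
for m, o in allowed:
    print(GENOTYPE_LABELS[m], GENOTYPE_LABELS[o], "G_ic under Model 11:", g_ic(m, o, 11))
```

For instance, `g_ic(2, 1, 5)` returns 1 because (AA, Aa) is $M_1$, which belongs to Model 5, while every matched pair is coded 0 under all models.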

Statistical model 

Information theory, which was initially developed in the 1940s [31] to quantify the transmission of information in communication channels within a rigorous mathematical framework, has gained much attention in genetic association studies in recent years [32-35]. Our aim is to propose a model selection strategy, based on entropy, that chooses the MFGI model best represented by the data. Before introducing the model selection strategy, we discuss some basic concepts of information theory. Entropy measures the uncertainty of a random variable. For a discrete random variable $X$, entropy is defined as:

$$H(X) = -\sum_{i=1}^{d} P(X = x_i)\log_b P(X = x_i) \qquad (1)$$

where $x_i$ and $P(X = x_i)$, $i = 1, 2, \ldots, d$, are the possible values of $X$ and their corresponding probabilities, and $b$ is the base of the logarithm, commonly taken to be 2 in information theory. We propose the following 2-step procedure to test MFGI effects:

Step 1: Select the MFGI model. Let $p$ and $1 - p$ be the proportions of cases and controls, respectively, in a given data set. The entropy of the disease outcome can be computed as

$$H(D) = -p\log_2(p) - (1 - p)\log_2(1 - p) \qquad (2)$$

This entropy serves as a measure of the uncertainty of disease outcome in the initial data set.
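Equation (2) is the two-category case of Equation (1). A minimal sketch, assuming the outcome is stored as a list of 0/1 labels; the function name `entropy` is our own, and the example uses the case-control counts (136 cases, 585 controls) reported for the pPROM data in the case study.

```python
import math


def entropy(labels, base=2):
    """Shannon entropy (Equation (1)) of the empirical distribution of `labels`."""
    n = len(labels)
    counts = {}
    for value in labels:
        counts[value] = counts.get(value, 0) + 1
    # Categories with zero count never appear in `counts`, so 0 * log(0) is skipped.
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


# Case-control labels with the pPROM case proportion p = 136 / 721.
y = [1] * 136 + [0] * 585
p = 136 / 721
closed_form = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # Equation (2)
print(round(entropy(y), 4), round(closed_form, 4))
```

The generic form is kept so that the same helper also covers the subset entropies needed for Equation (3).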

Under each of the 11 plausible MFGI models listed in Table 2, each mother-offspring pair can be classified as "high risk" or "low risk" based on its genotype combination. For example, under Model 1, mother-offspring pairs with genotype combination $M_1 = (AA, Aa)$ are considered "high risk" and all other combinations are considered "low risk". The high- and low-risk labels split the initial data set into 2 subsets. The entropy of disease outcome within each subset, $H(D \mid \text{risk} = \text{high})$ and $H(D \mid \text{risk} = \text{low})$, can be calculated using Equation (2). The conditional entropy of disease status given a particular MFGI model is then defined as

$$H(D \mid MFGI) = H(D \mid \text{risk} = \text{high})\,P(\text{risk} = \text{high}) + H(D \mid \text{risk} = \text{low})\,P(\text{risk} = \text{low}) \qquad (3)$$

This conditional entropy measures the remaining uncertainty of disease outcome given the MFGI model. The difference between the original entropy and this conditional entropy is the information gain (or mutual information), which reflects the amount of information that a given MFGI model provides (Equation (4)):

$$IG(D; MFGI) = H(D) - H(D \mid MFGI) \qquad (4)$$

To adjust for the uncertainty of disease status due to sampling, the information gain ratio (Equation (5)) was used as the criterion for selecting the optimal model to code the MFGI effect:

$$R = \frac{IG(D; MFGI)}{H(D)} = 1 - \frac{H(D \mid MFGI)}{H(D)} \qquad (5)$$
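Equations (3)-(5) reduce to a few lines once each pair carries a high/low risk label. The sketch below is illustrative only; `gain_ratio` is a hypothetical helper, and it returns 0 when $H(D) = 0$ (all cases or all controls), a convention we assume because Equation (5) is undefined in that case.

```python
import math


def entropy(labels):
    """Shannon entropy in bits of a list of discrete labels (Equation (1), b = 2)."""
    n = len(labels)
    counts = {}
    for v in labels:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gain_ratio(y, risk):
    """Information gain ratio R = 1 - H(D|MFGI) / H(D), Equations (3)-(5).

    y    : 0/1 disease outcomes
    risk : 0/1 labels, 1 = "high risk" under the MFGI model being scored
    """
    h_d = entropy(y)
    if h_d == 0:
        return 0.0  # no outcome uncertainty to explain (assumed convention)
    n = len(y)
    h_cond = 0.0
    for level in (0, 1):
        subset = [yi for yi, ri in zip(y, risk) if ri == level]
        if subset:
            # H(D | risk = level) weighted by P(risk = level), Equation (3)
            h_cond += entropy(subset) * len(subset) / n
    return 1.0 - h_cond / h_d


# A risk split that matches the outcome exactly removes all uncertainty.
print(gain_ratio([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0]))  # 1.0
# A split with the same case proportion in both groups carries no information.
print(gain_ratio([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.0
```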

As shown in Table 2, Model 11 is the most comprehensive model because it includes all 4 incompatible maternal-fetal genotype combinations. The study by Parimi et al. (2008) recommends this model as "optimal" for coding the MFGI effect, and we use it here as the default model. The information gain ratio was calculated for each of the 11 plausible MFGI models, and the model with the largest information gain ratio was selected as the candidate model. Because a candidate model could be chosen by chance and not reflect the real functional mechanism, a permutation procedure is used to assess how likely the candidate model is to be chosen under the assumption of no genetic association, as follows:

1. Obtain the information gain ratios $\{R_i, i = 1, 2, \ldots, 11\}$ for all models and identify the model with the maximum information gain ratio, $R^{max} = \max\{R_1, R_2, \ldots, R_{11}\}$, as the candidate model;

2. For $b = 1, 2, \ldots, B$, permute the disease labels and obtain the maximum information gain ratio $R^{max}_b = \max\{R_{1,b}, R_{2,b}, \ldots, R_{11,b}\}$;

3. Calculate the empirical P-value of selecting the model by chance:

$$\text{P-value} = \frac{1}{B}\sum_{b=1}^{B} I\left(R^{max}_b > R^{max}\right)$$

where $I(\cdot)$ is the indicator function.

If the empirical P-value is less than a predefined cutoff $r$ (say, $r = 0.0001$), we conclude that the candidate model was not selected by chance, and it is used as the analysis model in the next (testing) step. Otherwise, Model 11 is used as the analysis model.
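The selection-and-permutation procedure of Step 1 can be sketched as follows. This is a minimal, unoptimized illustration under stated assumptions, not the authors' implementation: genotypes are coded as minor-allele counts, ties in the information gain ratio go to the lowest-numbered model (Python's `max` keeps the first maximum), and all helper names are hypothetical.

```python
import math
import random

# Table 2 models as sets of mismatched (G_m, G_o) pairs coded G_ic = 1;
# genotypes are counts of the minor allele A (2 = AA, 1 = Aa, 0 = aa).
M1, M2, M3, M4 = (2, 1), (1, 2), (1, 0), (0, 1)
MODELS = {1: {M1}, 2: {M2}, 3: {M3}, 4: {M4}, 5: {M1, M2}, 6: {M1, M3},
          7: {M1, M4}, 8: {M2, M3}, 9: {M2, M4}, 10: {M3, M4},
          11: {M1, M2, M3, M4}}


def entropy(labels):
    """Shannon entropy in bits of a list of discrete labels."""
    n = len(labels)
    counts = {}
    for v in labels:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gain_ratio(y, risk):
    """Information gain ratio R = 1 - H(D|MFGI) / H(D), Equations (3)-(5)."""
    h_d = entropy(y)
    if h_d == 0:
        return 0.0
    h_cond = 0.0
    for level in (0, 1):
        sub = [yi for yi, ri in zip(y, risk) if ri == level]
        if sub:
            h_cond += entropy(sub) * len(sub) / len(y)
    return 1.0 - h_cond / h_d


def max_gain_ratio(y, g_m, g_o):
    """Return (model with the largest R, that R) over the 11 models of Table 2."""
    scores = {}
    for k, combos in MODELS.items():
        risk = [int((m, o) in combos) for m, o in zip(g_m, g_o)]
        scores[k] = gain_ratio(y, risk)
    best = max(scores, key=scores.get)
    return best, scores[best]


def select_model(y, g_m, g_o, B=1000, r=0.0001, seed=1):
    """Step 1: return (candidate model, empirical P-value, model used in Step 2)."""
    rng = random.Random(seed)
    candidate, r_max = max_gain_ratio(y, g_m, g_o)
    y_perm = list(y)
    exceed = 0
    for _ in range(B):
        rng.shuffle(y_perm)  # permute disease labels; genotypes stay fixed
        _, r_b = max_gain_ratio(y_perm, g_m, g_o)
        exceed += r_b > r_max
    p_value = exceed / B
    # Keep the candidate only if it beats chance; otherwise fall back to Model 11.
    chosen = candidate if p_value < r else 11
    return candidate, p_value, chosen


# Toy data: every mother-offspring combination repeated; cases are exactly the M1 pairs.
pairs = [(2, 2), (2, 1), (1, 2), (1, 1), (1, 0), (0, 1), (0, 0)] * 30
g_m = [m for m, _ in pairs]
g_o = [o for _, o in pairs]
y = [int(pair == M1) for pair in pairs]
print(select_model(y, g_m, g_o, B=50))  # (1, 0.0, 1)
```

Because the smallest nonzero empirical P-value is $1/B$, a cutoff such as $r = 0.0001$ is only meaningful when $B$ is on the order of $1/r$ or larger; with a smaller $B$, the candidate is accepted only when no permutation exceeds $R^{max}$.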

Step 2: Test the MFGI effect. Once an optimal incompatibility model is selected, it is used to code the incompatibility effect in a logistic regression model to assess the significance of the incompatibility effect, that is,

$$\operatorname{logit} P(Y = 1 \mid G_m, G_o) = \beta + \beta_m G_m + \beta_o G_o + \beta_{ic} G_{ic} \qquad (6)$$

where $G_m$ and $G_o$ represent the maternal and offspring additive variables, coded as 0, 1, or 2 for aa, Aa, and AA, respectively, where A is the risk allele, and $G_{ic}$ is the MFGI variable. The value of $G_{ic}$ depends on the selection result from Step 1. For example, if Model 1 is selected as the analysis model, then $G_{ic} = 1$ for mother-offspring pairs with genotype combination (AA, Aa) and $G_{ic} = 0$ otherwise. Testing the MFGI effect corresponds to testing the null hypothesis $H_0: \beta_{ic} = 0$, for which the likelihood ratio test was applied.
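Step 2 can be illustrated with a self-contained likelihood ratio test. This sketch fits Equation (6) with and without $G_{ic}$ by Newton-Raphson in pure Python and refers twice the log-likelihood difference to a chi-square distribution with 1 degree of freedom. It is our own illustration (all function names are hypothetical): it omits covariates such as maternal age and does not handle complete separation, where maximum-likelihood estimates do not exist. In practice, a standard GLM routine would normally be used.

```python
import math


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=50, tol=1e-10):
    """Newton-Raphson ML fit; X rows start with 1.0 (intercept). Returns (beta, loglik)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = _sigmoid(sum(b * x for b, x in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = _solve(hess, grad)  # Newton step (X'WX)^{-1} X'(y - mu)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        loglik += yi * eta - _softplus(eta)  # Bernoulli log-likelihood, logit link
    return beta, loglik


def lrt_mfgi(y, g_m, g_o, g_ic):
    """LRT of H0: beta_ic = 0 in Equation (6). Returns (statistic, P-value)."""
    full = [[1.0, m, o, c] for m, o, c in zip(g_m, g_o, g_ic)]
    null = [[1.0, m, o] for m, o in zip(g_m, g_o)]
    _, ll_full = fit_logistic(full, y)
    _, ll_null = fit_logistic(null, y)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # Chi-square(1) upper tail: P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Toy data: mismatched pairs (Model 11 coding) have a higher case rate than matched pairs.
pairs = [(2, 2), (2, 1), (1, 2), (1, 1), (1, 0), (0, 1), (0, 0)] * 20
g_m = [m for m, _ in pairs]
g_o = [o for _, o in pairs]
g_ic = [int(m != o) for m, o in pairs]
y = [int(i % 3 != 0) if c else int(i % 5 == 0) for i, c in enumerate(g_ic)]
print(lrt_mfgi(y, g_m, g_o, g_ic))
```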

Simulation 

To demonstrate that the proposed approach controls the type I error rate and is statistically powerful, we conducted a series of simulations under the null and alternative hypotheses. Genotypes of $N$ = 1,000,000 families (parents and a child) were generated in a population assuming symmetric mating and Mendelian transmission of alleles. Parental genotypes were generated from a multinomial distribution with pre-specified genotype frequencies. Either Hardy-Weinberg equilibrium (HWE; minor allele frequency = 0.2) or Hardy-Weinberg disequilibrium (HWD; genotype frequencies = (0.18, 0.47, 0.35) for homozygous carriers, heterozygotes, and noncarriers of the minor allele, respectively) was assumed. Fetal genotypes were simulated from the parents' genotypes following Mendelian inheritance. Paternal data were then dropped to mimic the maternal-fetal study design. Binary phenotypes were simulated based on a quantitative liability variable $Z = (z_1, z_2, \ldots, z_n)^T$, where $z_i$ denotes the liability of the $i$th subject. A threshold was chosen so that disease prevalence remained at 5%: mother-offspring pairs whose underlying quantitative liability exceeded the threshold were "diagnosed" as affected and the others as unaffected. The simulated data were treated as a population, from which samples of size $n$ were randomly drawn for subsequent analysis.
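The data-generating process can be sketched as below, at a much smaller scale than the $N$ = 1,000,000 families used here. Several details are assumptions not stated in the text: mothers and fathers are drawn independently (random mating), $\varepsilon$ is standard normal, and the liability threshold is taken as the empirical quantile of the simulated liabilities. The coefficients `a_m`, `a_o`, `a_ic` play the roles of $\alpha_m$, $\alpha_o$, $\alpha_{ic}$ in Equation (7), and all function names are hypothetical.

```python
import random


def draw_genotype(rng, freqs):
    """Draw a genotype (copies of minor allele A) given frequencies of (AA, Aa, aa)."""
    return rng.choices((2, 1, 0), weights=freqs)[0]


def transmit(rng, g):
    """Mendelian transmission: a parent carrying g copies of A passes A with prob. g/2."""
    return int(rng.random() < g / 2)


def simulate_population(N, freqs, a_m, a_o, a_ic, ic_pairs, prevalence=0.05, seed=0):
    """Simulate N mother-offspring pairs; returns tuples (G_m, G_o, G_ic, affected)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(N):
        g_mother = draw_genotype(rng, freqs)
        g_father = draw_genotype(rng, freqs)  # assumption: random mating
        g_child = transmit(rng, g_mother) + transmit(rng, g_father)
        g_ic = int((g_mother, g_child) in ic_pairs)
        # Liability of Equation (7) with alpha = 0 and epsilon ~ N(0, 1) (assumed normal).
        z = a_m * g_mother + a_o * g_child + a_ic * g_ic + rng.gauss(0.0, 1.0)
        rows.append((g_mother, g_child, g_ic, z))  # father's genotype is dropped
    # Threshold at the empirical (1 - prevalence) quantile of the liability.
    cut = sorted(row[3] for row in rows)[int((1 - prevalence) * N)]
    return [(m, c, ic, int(z >= cut)) for m, c, ic, z in rows]


# HWE with minor allele frequency 0.2: (AA, Aa, aa) = (0.04, 0.32, 0.64).
hwe = (0.2 ** 2, 2 * 0.2 * 0.8, 0.8 ** 2)
# Illustrative coefficients (Scenario IV values of Table 3) with Model 5 pairs {M1, M2}.
population = simulate_population(20000, hwe, 0.4, 0.0, 0.4, {(2, 1), (1, 2)})
sample = random.Random(1).sample(population, 500)
print(sum(row[3] for row in population) / len(population))
```

The HWD scenario corresponds to passing the genotype frequencies (0.18, 0.47, 0.35) as `freqs` instead of the HWE values.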

The underlying quantitative liability was simulated through the following regression model:

$$z = \alpha + \alpha_m G_m + \alpha_o G_o + \alpha_{ic} G_{ic} + \varepsilon \qquad (7)$$

where the $\alpha$s are defined in the same way as the $\beta$s in Equation (6). Without loss of generality, we set the overall mean $\alpha = 0$ and the error variance $\sigma^2 = 1$. The performance of our proposed two-step approach (called the model selection approach) was compared with that of its counterpart without model selection (called the full model approach). Quantitative data were generated using a particular MFGI model listed in Table 2, called the data generating model. Various scenarios were considered (Table 3): Scenario I assumes no genetic effect at all; Scenarios II and III generate data under the null hypothesis of no MFGI effect while allowing a maternal or fetal main effect; Scenarios IV-VI simulate the MFGI effect along with maternal and/or fetal main effects; and Scenarios VII-IX assume an MFGI effect only, at 3 different heritability levels ($h^2$ = 0.05, 0.10, 0.15). The effect size of incompatibility was computed as described by Parimi et al. [30]: let $\sigma_Z^2 = \alpha_{ic}^2\,q(1 - q) + \sigma^2$, where $q$ is the proportion of incompatible maternal-fetal genotype combinations in the simulated population. For a given heritability level $h^2$, the incompatibility effect can then be calculated from the equation $h^2 = 1 - \sigma^2/\sigma_Z^2$.

Table 3 Simulation scenarios with different parameter values

| Scenario | I | II | III | IV | V | VI | VII | VIII | IX |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_m$ | 0 | 0.4 | 0 | 0.4 | 0 | 0.2 | | | |
| $\beta_o$ | 0 | 0 | 0.4 | 0 | 0.4 | 0.2 | | | |
| $\beta_{ic}$ | 0 | 0 | 0 | 0.4 | 0.4 | 0.4 | | | |
| $h^2$ | | | | | | | 0.05 | 0.10 | 0.15 |
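Since $G_{ic}$ is a Bernoulli variable with $P(G_{ic} = 1) = q$, its variance is $q(1 - q)$, and solving $h^2 = 1 - \sigma^2/\sigma_Z^2$ for the effect gives $\alpha_{ic} = \sqrt{h^2\sigma^2 / ((1 - h^2)\,q(1 - q))}$. The sketch below computes this and checks it against the forward formula; the value $q = 0.2$ in the example is a hypothetical choice, because the actual $q$ depends on the data generating model and the genotype frequencies (it could be estimated as the mean of $G_{ic}$ in the simulated population).

```python
import math


def incompatibility_effect(h2, q, sigma2=1.0):
    """Solve h2 = 1 - sigma2 / (alpha_ic^2 q (1 - q) + sigma2) for alpha_ic >= 0."""
    if not (0.0 <= h2 < 1.0 and 0.0 < q < 1.0):
        raise ValueError("need 0 <= h2 < 1 and 0 < q < 1")
    return math.sqrt(h2 * sigma2 / ((1.0 - h2) * q * (1.0 - q)))


def heritability(alpha_ic, q, sigma2=1.0):
    """Forward formula: h2 = 1 - sigma2 / sigma_Z^2, sigma_Z^2 = alpha_ic^2 q(1-q) + sigma2."""
    sigma_z2 = alpha_ic ** 2 * q * (1.0 - q) + sigma2
    return 1.0 - sigma2 / sigma_z2


# Heritability levels of Scenarios VII-IX with a hypothetical q = 0.2.
for h2 in (0.05, 0.10, 0.15):
    a = incompatibility_effect(h2, q=0.2)
    print(h2, round(a, 4), round(heritability(a, q=0.2), 4))
```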

A case study 

We illustrate the proposed method via an application to a sub-analysis of a broader candidate gene study that investigated the role of genetic factors in the risk of pregnancy complications. Details of this sub-study have been published previously in a genetic association study [15]. Briefly, this case-control study includes patients with preterm prelabor rupture of membranes (pPROM) and their neonates, and control mothers with a normal pregnancy and their neonates. Patients of Hispanic origin were enrolled in a research protocol at the Sotero del Rio Hospital, Santiago, Chile.

pPROM occurs in 3%-4.5% of pregnancies in the United States and is responsible for about 30% of preterm births [15]. Although previous studies have suggested the presence of predisposing genetic factors for pPROM [9,10,36,37], the underlying genetic architecture remains unclear. SNPs in 190 candidate genes were selected and genotyped based on their possible biological roles in obstetrical diseases. We analyzed phenotype and genotype data from the study to determine whether incompatibilities between maternal and fetal genotypes increase the risk of pPROM. Six samples were removed because of a large proportion of missing genotypes (> 50%) in either the mother or the offspring. In addition, when searching across SNP markers, samples that did not follow Mendelian inheritance were excluded from the analysis. Our analysis included 742 SNPs in 190 candidate genes for 721 mother-offspring pairs (case:control ratio = 136:585). Maternal age, which has previously been shown to be statistically significant [15], was included in the model to adjust for its effect. The proposed 2-step procedure and the full model approach were used to analyze the data, and Table 4 presents the results of the analysis. The permutation procedure was handled slightly differently in the model selection step of this analysis: for each permutation, we calculated the maximum information gain ratio at every genomic locus, that is, 742 values per permutation, and the maximum information gain ratios from 20 permutations (a total of 742 × 20 = 14,840 values) were collected and used to obtain empirical P-values. This reduces computational time and allows us to address the multiplicity issue. A cutoff value of $r = 0.05$ was used in the model selection step because we aimed to find as many true positives as possible, although the type I error rate may be slightly inflated when maternal and/or fetal main effects co-exist with the MFGI effect.

Table 4 List of SNPs with maternal-fetal genotype incompatibility effect associated with pPROM at $\alpha = 0.005$

| Gene | Region | rs Number | P-value¹ | P-value² | MS | OR* | 95% CI |
|---|---|---|---|---|---|---|---|
| MGP | promoter | rs1800801 | 0.0006 | 0.0404 | 5 | 0.4175 | [0.2343, 0.7438] |
| MMP14 | exon 5 | rs2236302 | 0.0014 | 0.0051 | 3 | 2.8013 | [1.6398, 4.7854] |
| COL5A2 | exon 48 | rs6434312 | 0.0017 | 0.0045 | 10 | 0.5370 | [0.3502, 0.8233] |
| ANGPT2 | intron 6 | rs2979671 | 0.0020 | 0.1450 | 1 | 0.2820 | [0.0968, 0.8216] |
| ANGPT2 | exon 4 | rs3020221 | 0.0020 | 0.0259 | 1 | 0.2826 | [0.0947, 0.8434] |
| TNFRSF1A | intron 4 | rs1800692 | 0.0022 | 0.0968 | 7 | 2.1064 | [1.3307, 3.3342] |
| AQP2 | exon 4 | 629722653 | 0.0027 | 0.0271 | 2 | 2.8062 | [1.4721, 5.3495] |
| CRHR1 | intron 7 | rs16940668 | 0.0038 | 0.0038 | 11 | 1.7365 | [1.1605, 2.5986] |
| COL1A2 | intron 46 | rs13240759 | 0.0041 | 0.0041 | 9 | 0.5305 | [0.3404, 0.8265] |
| GJA4 | exon 2 | rs1764389 | 0.0044 | 0.0044 | 11 | 1.6831 | [1.1005, 2.5740] |
| HLA-E | exon 3 | rs1264457 | 0.0046 | 0.6216 | 8 | 0.5468 | [0.3491, 0.8566] |
| IL10 | intron 4 | rs5743627 | 0.0048 | 0.0048 | 11 | 1.3354 | [0.8149, 2.1885] |
| COL4A2 | intron 33 | rs41315048 | 0.0049 | 0.0101 | 10 | 0.5709 | [0.3718, 0.8767] |

¹ P-value obtained using the 2-step approach;
² P-value obtained using the full model approach;
* Odds ratio for the MFGI effect;
MS: Model selection result;
OR: Odds ratio;
CI: Confidence interval.

Table 5 Type I error for testing the MFGI effect under simulation Scenarios I-III

| HWE/D* | Model | Scenario I, n = 500 | Scenario I, n = 1000 | Scenario II, n = 500 | Scenario II, n = 1000 | Scenario III, n = 500 | Scenario III, n = 1000 |
|---|---|---|---|---|---|---|---|
| HWE | Full model | 0.0566 | 0.0512 | 0.0475 | 0.0540 | 0.0527 | 0.0490 |
| | Model selection | 0.0566 | 0.0512 | 0.0475 | 0.0540 | 0.0527 | 0.0492 |
| HWD | Full model | 0.0511 | 0.0470 | 0.0492 | 0.0529 | 0.0536 | 0.0477 |
| | Model selection | 0.0511 | 0.0470 | 0.0492 | 0.0537 | 0.0536 | 0.0483 |

*: HWE: Hardy-Weinberg equilibrium; HWD: Hardy-Weinberg disequilibrium.
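The pooled-null variant used in this analysis can be sketched as follows. The text does not spell out every detail, so this assumes that each locus's observed maximum information gain ratio is compared with the full pool of permuted maxima (742 × 20 values here), using the same strict inequality as in Step 1; `pooled_empirical_pvalues` and the example numbers are hypothetical.

```python
from bisect import bisect_right


def pooled_empirical_pvalues(observed, null_pool):
    """Empirical P-value per locus against a pooled null distribution.

    observed  : dict locus -> observed maximum gain ratio R_max at that locus
    null_pool : permuted maxima pooled over all loci and permutations
    """
    pool = sorted(null_pool)
    n = len(pool)
    # Fraction of pooled null values strictly greater than the observed R_max,
    # matching the indicator I(R_b^max > R^max) of Step 1.
    return {locus: (n - bisect_right(pool, r)) / n for locus, r in observed.items()}


# Hypothetical numbers for illustration only.
null_pool = [0.001, 0.003, 0.004, 0.010, 0.020]
observed = {"snp_a": 0.015, "snp_b": 0.030}
print(pooled_empirical_pvalues(observed, null_pool))
```

Under this reading, loci whose pooled P-value falls below $r = 0.05$ would keep their candidate model, and the others would revert to Model 11. Sorting the pool once makes each lookup logarithmic, which matters little at 14,840 values but keeps the sketch scalable.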

Results 

Simulation results 

To assess the type I error rate, we simulated phenotypes under the null hypothesis of no MFGI effect. Specifically, data were generated under Scenarios I-III with sample sizes of 500 and 1000. Empirical type I error rates were estimated as the proportion of replicates with a P-value less than 0.05 across 11,000 replicates. Overall, the test size was well controlled at the nominal level (0.05) for both approaches under all scenarios we considered. The type I error rate of the model selection approach depends on the cutoff value $r$ used in the model selection step. In our simulations, when a loose cutoff of $r = 0.05$ was used, the empirical type I error rate slightly exceeded the nominal level under Scenarios II and III, where a maternal or fetal main effect was simulated, reaching around 0.06 (detailed data not shown). As the cutoff becomes more stringent, the empirical type I error rate approaches the nominal level. Table 5 presents the type I error rates obtained with $r = 0.0001$, which are controlled at the nominal level; the subsequent power estimates were also based on $r = 0.0001$. As shown in Table 5, the type I error rates of our model selection approach are the same as those of the full model approach under most scenarios. This is because the model selection step almost always chooses the full model (Model 11) when there is no incompatibility effect, leading to the same analysis model for both approaches. HWD had no appreciable effect on type I error: estimates of the type I error rate under HWD are comparable to those under HWE.

Figure 1 Statistical power estimates of the proposed model selection approach (solid) and the full model approach (dashed) for Scenarios IV-IX (as given in Table 3) assuming HWE, with sample sizes of 500 (left) and 1,000 (right), using 1,000 replicates.

Figures 1 and 2 display statistical power estimates for the proposed model selection approach and the full model approach for testing MFGI. The testing power of our model selection approach was generally higher than that of the full model approach under all the scenarios considered, and the improvement was more striking for larger sample sizes. For scenarios that assume HWE, when the MFGI effect was simulated together with maternal and/or fetal main effects (Scenarios IV-VI), our method improved the power, particularly when the true incompatibility model was Model 5. For example, to detect the true MFGI effect when Model 5 was used to generate data with a sample size of 1000 under Scenario IV, our model selection approach had a power of 0.631, whereas the full model approach had a power of only 0.126 (top right panel of Figure 1). When only the MFGI effect was simulated (Scenarios VII-IX), our model selection approach increased the testing power, especially when the underlying true incompatibility model was Model 1, 2, or 5 (bottom panels of Figure 1). The increase in testing power results from the model selection step, which can choose the true data generating model. The estimated probability that our approach selects the underlying incompatibility model as the analysis model approaches 1 at a heritability level of 0.1 or above. At a lower heritability level of 0.05, the estimated probability of selecting the true model decreases, especially for scenarios under HWD (right panel of Figure 3). Although the improvements in testing power for HWD scenarios were not as striking as those for HWE scenarios (Figure 2), our 2-step approach still performed better than the full model approach.

Figure 2 Statistical power estimates of the proposed model selection approach (solid) and the full model approach (dashed) for Scenarios IV-IX (as given in Table 3) not assuming HWE, with sample sizes of 500 (left) and 1,000 (right), using 1,000 replicates.

Figure 3 Proportions of the simulations that select the true data generating model (black portion) for Scenarios VII-IX (from top to bottom) under HWE (left panel) and HWD (right panel), based on 1000 replicates.

Data analysis results 

Table 4 summarizes the results of the pPROM data analysis for the 2 approaches. The table shows that our 2-step approach identified MFGIs that could be missed by the full model approach. For example, our proposed approach gave a P-value of 0.002 for both SNPs in gene ANGPT2 (rs2979671 in intron 6 and rs3020221 in exon 4), whereas the full model approach gave P-values of 0.1450 and 0.0259 for rs2979671 and rs3020221, respectively. Model 1 was selected as the incompatibility model for SNP rs2979671 in ANGPT2. SNPs with an odds ratio (OR) less than 1 showed protective effects for the defined genotype incompatibility combinations (Table 4). Here, OR refers to the ratio of the odds of developing pPROM in the two risk groups defined by the selected MFGI model. For example, SNP rs2979671 in ANGPT2 had an OR of 0.282, which implies that mother-offspring pairs with the genotype combination (A/A, G/A) are less likely than pairs with other genotype combinations to develop pPROM. Such protective effects were also observed for SNPs identified in genes MGP, COL5A2, COL1A2, HLA-E, and COL4A2 (ORs and CIs shown in Table 4).

In comparison, SNPs identified in genes MMP14, TNFRSF1A, AQP2, CRHR1, and GJA4 had ORs greater than 1, indicating a possibly higher risk of pPROM for mother-offspring pairs with the genotype incompatibility combinations defined by the corresponding selected incompatibility models. For example, for SNP rs2236302 in exon 5 of gene MMP14, mother-offspring pairs with the genotype combination (C/G, C/C) are at higher risk of developing pPROM: 33 of the 104 mothers in the defined "high-risk" group developed pPROM, whereas only 99 of the 611 mothers in the "low-risk" group did (OR = 2.8013, 95% CI = [1.6398, 4.7854]). The confidence interval of the OR for SNP rs5743627 in gene IL10 covers 1, indicating that its MFGI effect is not marginally significant. To our knowledge, this is the first analysis that specifically investigates maternal-fetal genotype incompatibility effects underlying pPROM. We believe that our results will be helpful for generating hypotheses for future studies or wet-lab validation.
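As a check on the MMP14 counts quoted above, the crude (unadjusted) OR and a Woolf-type 95% CI can be computed from the 2 × 2 table. This sketch is our own illustration (`odds_ratio_2x2` is a hypothetical name). The crude value of about 2.40 differs from the tabulated 2.8013, presumably because the reported OR comes from the regression model, which also includes the maternal and fetal main effects and maternal age.

```python
import math


def odds_ratio_2x2(cases_high, n_high, cases_low, n_low, z=1.959964):
    """Crude OR and Woolf (log-scale) 95% CI for high- vs low-risk groups."""
    a, b = cases_high, n_high - cases_high  # high risk: cases, non-cases
    c, d = cases_low, n_low - cases_low     # low risk: cases, non-cases
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo, hi = (math.exp(math.log(or_) + s * z * se) for s in (-1, 1))
    return or_, lo, hi


# MMP14 rs2236302: 33 of 104 high-risk vs 99 of 611 low-risk mothers had pPROM.
or_, lo, hi = odds_ratio_2x2(33, 104, 99, 611)
print(f"crude OR = {or_:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")  # crude OR = 2.40, 95% CI = [1.51, 3.83]
```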

Discussion and conclusions 

The importance of maternal-fetal genotype incompatibility in human diseases, particularly in obstetrical complications, was first discussed in the 1990s [38] and has been studied intensively in recent years [16-19,23,24,26]. Most currently available statistical methods for identifying MFGI effects fall within the framework of generalized linear regression [20-22,25,30]. Since the underlying MFGI mechanism is unknown and may vary across genetic variants, it is challenging to model the incompatibility effect appropriately. The complexity largely stems from the underlying competition among 3 sets of genes: the maternally derived fetal gene, the paternally derived fetal gene, and the untransmitted maternal gene [39]. Conflict among these 3 sets of genes may result in an incompatibility effect, which may lead to pregnancy complications such as pPROM.

A commonly used approach is to code the incompatibility effect whenever maternal and fetal genotypes disagree [30]. However, our simulation studies show that this simple treatment ignores the underlying mode of disease gene action and has potential drawbacks: when maternal and/or fetal main effects exist, the method increases the false-positive rate of incompatibility detection. Rather than predefining an incompatibility model, we propose a strategy that selects an optimal incompatibility model capturing the underlying disease gene function. A model is selected as the candidate if its entropy-based measurement is the maximum among all plausible incompatibility models, and a permutation procedure assesses whether this selection could have occurred by chance. If it is unlikely to be a chance selection, the candidate model is then used as the analysis model in subsequent statistical tests of the incompatibility effect, together with the maternal and fetal main genetic effects.

Intuitively, our approach boosts statistical power by adding an MFGI model selection step. The power gain arises because, with enough samples, the true underlying incompatibility model is selected most of the time. We conducted extensive simulation studies, considering the effects of heritability, the HWE assumption, sample size, and different disease gene functions. The results indicate that, when the underlying truth is unknown, the proposed 2-step strategy works well compared with the full model approach. Our approach controls the type I error rate at the nominal level and achieves higher power than the full model approach, which does not perform incompatibility model selection. Our approach does not make strong assumptions, and its performance is quite consistent across settings such as HWE or HWD, with or without maternal and/or fetal main effects.



We applied the 2-step approach to study maternal-fetal genotype incompatibility effects associated with pPROM and identified several interesting SNPs. Our findings provide clues about the biological mechanisms through which MFGI in these genes may have an adverse or protective effect on pPROM. Our results can be used to generate hypotheses for future biological validation studies of the pathogenesis of pPROM.

Overall, this method can be applied to study the maternal-fetal genotype incompatibility component of obstetrical complications, such as preeclampsia, and of other human diseases in which maternal and fetal genetic factors interact to increase the risk of disease.